In medical science, collecting data on different diseases is essential, and the primary purpose of such data is the investigation of disease. Myocardial infarction is a serious risk factor for mortality, and previous studies have mainly focused on estimating the probability of myocardial infarction from demographic characteristics, echocardiography, and electrocardiography. In contrast, the aim of this study is to apply data-analysis algorithms and compare their accuracy on heart-attack patients in order to predict the strength of the heart muscle (ejection fraction) during myocardial infarction, taking emergency interventions into account. To this end, data from 105 myocardial infarction patients, including age, time to emergency intervention, the creatine phosphokinase (CPK) test, heart rate, blood sugar, and venous measurements, were collected and analysed with classification techniques, namely random forest, decision tree, support vector machine (SVM), k-nearest neighbours, and ordinal logistic regression. Finally, in terms of the averaged evaluation metrics, the random forest model, with an accuracy of 76%, was selected as the best model. In addition, seven features, namely the CPK test, urea, white and red blood cell counts, blood sugar, time, and haemoglobin, were identified as the most effective features for the ejection fraction variable.
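The study does not publish code; as an illustration only, the following minimal scikit-learn sketch shows how such a classifier comparison could be set up on a small tabular clinical dataset. The file name, feature names, and target column are hypothetical placeholders, and plain multinomial logistic regression stands in for the ordinal variant.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical tabular file: clinical features plus a discretised ejection-fraction class.
df = pd.read_csv("mi_patients.csv")          # placeholder path
X = df[["age", "cpk", "heart_rate", "blood_sugar", "urea", "hemoglobin", "time_to_treatment"]]
y = df["ejection_fraction_class"]            # e.g. low / moderate / normal

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    # Ordinal logistic regression is not in scikit-learn; multinomial logistic
    # regression is used here as a rough stand-in.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```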
Differentiable Search Indices (DSIs) encode a corpus of documents in the parameters of a model and use the same model to map queries directly to relevant document identifiers. Despite the strong performance of DSI models, deploying them in situations where the corpus changes over time is computationally expensive because reindexing the corpus requires re-training the model. In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents. Across different model scales and document identifier representations, we show that continual indexing of new documents leads to considerable forgetting of previously indexed documents. We also hypothesize and verify that the model experiences forgetting events during training, leading to unstable learning. To mitigate these issues, we investigate two approaches. The first focuses on modifying the training dynamics. Flatter minima implicitly alleviate forgetting, so we optimize for flatter loss basins and show that the model stably memorizes more documents (+12\%). Next, we introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task. Extensive experiments on novel continual indexing benchmarks based on Natural Questions (NQ) and MS MARCO demonstrate that our proposed solution mitigates forgetting by a significant margin. Concretely, it improves the average Hits@10 by $+21.1\%$ over competitive baselines for NQ and requires $6$ times fewer model updates compared to re-training the DSI model for incrementally indexing five corpora in a sequence.
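The abstract describes the generative memory only at a high level; the sketch below illustrates one plausible reading of it. A query generator (a hypothetical doc2query-style model, passed in as `query_generator`) samples pseudo-queries for previously indexed documents, and those (pseudo-query, docid) pairs are interleaved with the new indexing examples so the retrieval task keeps being rehearsed during continual indexing.

```python
import random

def build_continual_batches(new_docs, old_docids, query_generator,
                            batch_size=32, replay_fraction=0.5):
    """Interleave indexing examples for new documents with generative-memory
    replay examples for previously indexed documents.

    new_docs:        list of (doc_text, docid) pairs to be indexed now
    old_docids:      docids already memorised by the DSI model
    query_generator: callable docid -> sampled pseudo-query string
                     (a hypothetical doc2query-style generative memory)
    Yields batches of (input_text, target_docid) training pairs.
    """
    new_examples = list(new_docs)
    random.shuffle(new_examples)

    n_replay = int(batch_size * replay_fraction)
    n_new = batch_size - n_replay

    for i in range(0, len(new_examples), n_new):
        batch = new_examples[i:i + n_new]
        # Rehearse retrieval for old documents via sampled pseudo-queries.
        for docid in random.sample(old_docids, k=min(n_replay, len(old_docids))):
            batch.append((query_generator(docid), docid))
        random.shuffle(batch)
        yield batch
```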
One of the main challenges in deep learning-based underwater image enhancement is the limited availability of high-quality training data. Underwater images are difficult to capture and are often of poor quality due to the distortion and loss of colour and contrast in water. This makes it difficult to train supervised deep learning models on large and diverse datasets, which can limit the model's performance. In this paper, we explore an alternative approach to supervised underwater image enhancement. Specifically, we propose a novel unsupervised underwater image enhancement framework that employs a conditional variational autoencoder (cVAE) to train a deep learning model with probabilistic adaptive instance normalization (PAdaIN) and statistically guided multi-colour space stretch that produces realistic underwater images. The resulting framework is composed of a U-Net as a feature extractor and a PAdaIN to encode the uncertainty, which we call UDnet. To improve the visual quality of the images generated by UDnet, we use a statistically guided multi-colour space stretch module that ensures visual consistency with the input image and provides an alternative to training using a ground truth image. The proposed model does not need manual human annotation and can learn with a limited amount of data and achieves state-of-the-art results on underwater images. We evaluated our proposed framework on eight publicly-available datasets. The results show that our proposed framework yields competitive performance compared to other state-of-the-art approaches in quantitative as well as qualitative metrics. Code available at https://github.com/alzayats/UDnet .
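PAdaIN is not spelled out at code level in the abstract; the PyTorch snippet below is a minimal sketch of one plausible reading: adaptive instance normalization whose per-channel affine statistics are sampled from a learned Gaussian via the reparameterisation trick, which is one way the uncertainty could be encoded. Tensor shapes and layer sizes are assumptions, not UDnet's actual configuration.

```python
import torch
import torch.nn as nn

class PAdaIN(nn.Module):
    """Probabilistic adaptive instance normalization (illustrative sketch).

    The per-channel scale and shift are not fixed: their mean and log-variance
    are predicted from a conditioning vector and sampled with the
    reparameterisation trick, so each forward pass yields a slightly
    different, plausible enhancement.
    """
    def __init__(self, num_channels, cond_dim):
        super().__init__()
        self.to_stats = nn.Linear(cond_dim, num_channels * 4)  # mu/logvar for scale and shift
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)

    def forward(self, x, cond):
        # x: (B, C, H, W) feature map, cond: (B, cond_dim) conditioning code
        stats = self.to_stats(cond)
        scale_mu, scale_logvar, shift_mu, shift_logvar = stats.chunk(4, dim=1)
        scale = scale_mu + torch.randn_like(scale_mu) * (0.5 * scale_logvar).exp()
        shift = shift_mu + torch.randn_like(shift_mu) * (0.5 * shift_logvar).exp()
        x = self.norm(x)
        return x * (1 + scale[:, :, None, None]) + shift[:, :, None, None]

# Example usage with assumed shapes:
feat = torch.randn(2, 64, 32, 32)
cond = torch.randn(2, 128)
out = PAdaIN(64, 128)(feat, cond)
```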
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Text-based personality computing (TPC) has gained many research interests in NLP. In this paper, we describe 15 challenges that we consider deserving the attention of the research community. These challenges are organized by the following topics: personality taxonomies, measurement quality, datasets, performance evaluation, modelling choices, as well as ethics and fairness. When addressing each challenge, not only do we combine perspectives from both NLP and social sciences, but also offer concrete suggestions towards more valid and reliable TPC research.
Stance detection (SD) can be considered a special case of textual entailment recognition (TER), a generic natural language task. Modelling SD as TER may offer benefits like more training data and a more general learning scheme. In this paper, we present an initial empirical analysis of this approach. We apply it to a difficult but relevant test case where no existing labelled SD dataset is available, because this is where modelling SD as TER may be especially helpful. We also leverage measurement knowledge from social sciences to improve model performance. We discuss our findings and suggest future research directions.
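As a concrete, hedged illustration of casting stance detection as entailment, the snippet below uses the Hugging Face zero-shot-classification pipeline, which is itself NLI-based, to score stance labels phrased as hypotheses about a text. The example sentence and label wording are invented, and this is not the authors' exact setup.

```python
from transformers import pipeline

# NLI-based zero-shot classifier; stance labels are phrased as hypotheses.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "We should invest far more public money in renewable energy."
labels = ["in favour of renewable energy subsidies",
          "against renewable energy subsidies",
          "neutral about renewable energy subsidies"]

result = classifier(text, candidate_labels=labels,
                    hypothesis_template="The author is {}.")
print(result["labels"][0], round(result["scores"][0], 3))
```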
Training large, deep neural networks to convergence can be prohibitively expensive. As a result, often only a small selection of popular, dense models are reused across different contexts and tasks. Increasingly, sparsely activated models, which seek to decouple model size from computation costs, are becoming an attractive alternative to dense models. Although more efficient in terms of quality and computation cost, sparse models remain data-hungry and costly to train from scratch in the large scale regime. In this work, we propose sparse upcycling -- a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint. We show that sparsely upcycled T5 Base, Large, and XL language models and Vision Transformer Base and Large models, respectively, significantly outperform their dense counterparts on SuperGLUE and ImageNet, using only ~50% of the initial dense pretraining sunk cost. The upcycled models also outperform sparse models trained from scratch on 100% of the initial dense pretraining computation budget.
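The abstract implies that each expert in the upcycled Mixture-of-Experts layer starts as a copy of the dense checkpoint's feed-forward block; the PyTorch sketch below shows that initialization step under that reading. The module names, the naive top-k router, and the per-expert loop are simplified assumptions for illustration, not the paper's actual T5/ViT implementation.

```python
import copy
import torch
import torch.nn as nn

class UpcycledMoE(nn.Module):
    """Replace one dense FFN with a Mixture-of-Experts layer whose experts
    are all initialised from the dense FFN's weights (sparse upcycling sketch)."""
    def __init__(self, dense_ffn: nn.Module, d_model: int, num_experts: int = 8, top_k: int = 1):
        super().__init__()
        self.experts = nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(num_experts))
        self.router = nn.Linear(d_model, num_experts)   # new, randomly initialised
        self.top_k = top_k

    def forward(self, x):                # x: (tokens, d_model)
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

# Example: upcycle a toy dense FFN.
d_model = 64
dense_ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model))
moe = UpcycledMoE(dense_ffn, d_model)
tokens = torch.randn(10, d_model)
print(moe(tokens).shape)
```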
Synergetic use of sensors for soil moisture retrieval is attracting considerable interest due to the different advantages of different sensors. Active, passive, and optic data integration could be a comprehensive solution for exploiting the advantages of different sensors aimed at preparing soil moisture maps. Typically, pixel-based methods are used for multi-sensor fusion. Since, different applications need different scales of soil moisture maps, pixel-based approaches are limited for this purpose. Object-based image analysis employing an image object instead of a pixel could help us to meet this need. This paper proposes a segment-based image fusion framework to evaluate the possibility of preparing a multi-scale soil moisture map through integrated Sentinel-1, Sentinel-2, and Soil Moisture Active Passive (SMAP) data. The results confirmed that the proposed methodology was able to improve soil moisture estimation in different scales up to 20% better compared to pixel-based fusion approach.
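The segment-based fusion framework is only summarised above; the sketch below illustrates the general object-based idea under stated assumptions: superpixel segments are computed on the optical (Sentinel-2) image, per-segment statistics are aggregated from each sensor's co-registered layers, and a regressor maps the segment-level features to soil moisture. The array names, random data, and regressor choice are hypothetical placeholders, not the paper's method.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestRegressor

def segment_features(segments, layers):
    """Average each raster layer over every segment (image object)."""
    ids = np.unique(segments)
    return np.array([[layer[segments == i].mean() for layer in layers] for i in ids])

# Assumed co-registered rasters on a common grid (H x W):
# s2_rgb: Sentinel-2 optical, s1_vv / s1_vh: Sentinel-1 backscatter, smap_sm: SMAP soil moisture.
H, W = 120, 120
s2_rgb = np.random.rand(H, W, 3)
s1_vv, s1_vh, smap_sm = (np.random.rand(H, W) for _ in range(3))

segments = slic(s2_rgb, n_segments=200, compactness=10, start_label=0)
X = segment_features(segments, [s2_rgb[..., 0], s2_rgb[..., 1], s2_rgb[..., 2], s1_vv, s1_vh, smap_sm])
y = np.random.rand(X.shape[0])            # placeholder in-situ soil moisture targets per segment

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(model.predict(X[:5]))
```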
Machine Learning (ML) technologies have been increasingly adopted in Medical Cyber-Physical Systems (MCPS) to enable smart healthcare. Assuring the safety and effectiveness of learning-enabled MCPS is challenging, as such systems must account for diverse patient profiles and physiological dynamics and handle operational uncertainties. In this paper, we develop a safety assurance case for ML controllers in learning-enabled MCPS, with an emphasis on establishing confidence in the ML-based predictions. We present the safety assurance case in detail for Artificial Pancreas Systems (APS) as a representative application of learning-enabled MCPS, and provide a detailed analysis by implementing a deep neural network for the prediction in APS. We check the sufficiency of the ML data and analyze the correctness of the ML-based prediction using formal verification. Finally, we outline open research problems based on our experience in this paper.
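As a hedged illustration of the kind of ML-based prediction discussed for APS, the sketch below defines a small PyTorch network that predicts future blood glucose from a recent CGM window and wraps it in a simple runtime range check. The architecture, window length, and safety bounds are invented for illustration and are not the paper's verified model.

```python
import torch
import torch.nn as nn

class GlucosePredictor(nn.Module):
    """Toy feed-forward predictor: past CGM window -> glucose 30 min ahead."""
    def __init__(self, window=12):   # e.g. 12 samples at 5-minute intervals
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(window, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):            # x: (batch, window) in mg/dL
        return self.net(x)

def checked_prediction(model, history, low=40.0, high=400.0):
    """Runtime monitor: reject predictions outside a physiologically
    plausible range before they reach the insulin controller."""
    pred = model(history).item()
    if not (low <= pred <= high):
        raise ValueError(f"prediction {pred:.1f} mg/dL outside plausible range")
    return pred

model = GlucosePredictor()
history = torch.full((1, 12), 120.0)     # a flat 120 mg/dL history (placeholder data)
try:
    print(checked_prediction(model, history))
except ValueError as err:
    print("monitor rejected:", err)      # expected for this untrained toy model
```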
Chromosome analysis is essential for diagnosing genetic disorders. For hematologic malignancies, identification of somatic clonal aberrations by karyotype analysis remains the standard of care. However, karyotyping is costly and time-consuming because of the largely manual process and the expertise required in identifying and annotating aberrations. Efforts to automate karyotype analysis to date fell short in aberration detection. Using a training set of ~10k patient specimens and ~50k karyograms from over 5 years from the Fred Hutchinson Cancer Center, we created a labeled set of images representing individual chromosomes. These individual chromosomes were used to train and assess deep learning models for classifying the 24 human chromosomes and identifying chromosomal aberrations. The top-accuracy models utilized the recently introduced Topological Vision Transformers (TopViTs) with 2-level-block-Toeplitz masking, to incorporate structural inductive bias. TopViT outperformed CNN (Inception) models with >99.3% accuracy for chromosome identification, and exhibited accuracies >99% for aberration detection in most aberrations. Notably, we were able to show high-quality performance even in "few shot" learning scenarios. Incorporating the definition of clonality substantially improved both precision and recall (sensitivity). When applied to "zero shot" scenarios, the model captured aberrations without training, with perfect precision at >50% recall. Together these results show that modern deep learning models can approach expert-level performance for chromosome aberration detection. To our knowledge, this is the first study demonstrating the downstream effectiveness of TopViTs. These results open up exciting opportunities for not only expediting patient results but providing a scalable technology for early screening of low-abundance chromosomal lesions.
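The exact 2-level-block-Toeplitz masking used by TopViT is not reproduced here; as a simplified, hedged illustration of the underlying idea, the snippet below adds a learnable Toeplitz (relative-position) bias to single-head attention logits, i.e. a bias that depends only on the offset between token positions. This captures the structural inductive bias such masking encodes, but it is a stand-in, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ToeplitzBiasedAttention(nn.Module):
    """Single-head self-attention with a learnable Toeplitz bias:
    bias[i, j] depends only on the offset (i - j), so pairs of tokens
    at the same relative displacement share parameters."""
    def __init__(self, dim, seq_len):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.offset_bias = nn.Parameter(torch.zeros(2 * seq_len - 1))
        idx = torch.arange(seq_len)
        # Map every (i, j) pair to its offset index i - j + (seq_len - 1).
        self.register_buffer("offsets", idx[:, None] - idx[None, :] + seq_len - 1)

    def forward(self, x):                         # x: (batch, seq_len, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        logits = logits + self.offset_bias[self.offsets]     # Toeplitz-structured bias
        return logits.softmax(dim=-1) @ v

attn = ToeplitzBiasedAttention(dim=32, seq_len=16)
patches = torch.randn(2, 16, 32)
print(attn(patches).shape)                        # (2, 16, 32)
```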